Turning DTDs into specialized tree-automata-based schemata to match a collection of marked-up documents
Authors
Abstract
Regular tree automata (RTA) or, equivalently, forest regular grammars (FRG) have recently been proposed for use as XML (Extensible Markup Language) schemata. They are more powerful than the usual XML DTDs (document type definitions), make the implementation, optimization and pruning of XML queries easier, and allow for the implementation of context-sensitive content models. We describe a method for the automatic generation of a specialized RTA-based schema from a source DTD and a sample of marked-up documents showing context-sensitive behaviour in content models. It creates the smallest RTA-based schema with which all the XML documents in the sample comply and which does not accept any documents that are not valid according to the original DTD. In this way, new files can be created, parsed, and queried using the specialized schema while remaining compliant with the original DTD. The tool is currently being tested at the Miguel de Cervantes digital library at the University of Alacant (http://cervantesvirtual.com).
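To make the notion of a context-sensitive content model concrete, the following is a minimal sketch of a bottom-up regular tree automaton over element trees. It is not the generation method described in the abstract. The element names (`book`, `author`, `publisher`, `name`, `first`, `last`), the state names, and the helper functions `states` and `accepts` are all hypothetical, and text nodes are ignored. The point it illustrates is that an RTA can assign two different states, and therefore two different content models, to the same element name depending on where it occurs, which a DTD with a single content model per element name cannot do.

```python
import re
from itertools import product

# Each rule: (element label, regex over space-separated child states, resulting state).
# All labels and state names are illustrative assumptions, not taken from the paper.
RULES = [
    ("first", r"", "q_first"),
    ("last", r"", "q_last"),
    ("name", r"q_first q_last", "q_person_name"),  # <name> as required inside <author>
    ("name", r"", "q_plain_name"),                 # <name> as required inside <publisher>
    ("author", r"q_person_name", "q_author"),
    ("publisher", r"q_plain_name", "q_publisher"),
    ("book", r"(q_author )+q_publisher", "q_book"),
]

def states(tree):
    """Return the set of states the automaton can assign to the root of `tree`.

    A tree is a pair (label, list_of_child_trees); text content is ignored.
    """
    label, children = tree
    child_sets = [states(child) for child in children]
    result = set()
    # Try every way of picking one state per child (brute force, fine for tiny trees).
    for choice in product(*child_sets):
        word = " ".join(choice)
        for rule_label, pattern, state in RULES:
            if rule_label == label and re.fullmatch(pattern, word):
                result.add(state)
    return result

def accepts(tree, final_state="q_book"):
    """A document is accepted if its root can reach the final state."""
    return final_state in states(tree)

structured_name = ("name", [("first", []), ("last", [])])
plain_name = ("name", [])

good = ("book", [("author", [structured_name]), ("publisher", [plain_name])])
bad = ("book", [("author", [plain_name]), ("publisher", [plain_name])])

print(accepts(good))
print(accepts(bad))
```

A DTD that declares `name` with an optional `(first, last)` content model would accept both sample trees. The automaton above accepts only the first, because the `author` rule demands the structured variant of `name`.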
Similar resources
Evaluating Structural Similarity in XML Documents
XML documents on the web are often found without DTDs, particularly when these documents have been created from legacy HTML. Yet having knowledge of the DTD can be valuable in querying and manipulating such documents. Recent work (cf. [10]) has given us a means to (re-)construct a DTD to describe the structure common to a given set of document instances. However, given a collection of documents...
Using Regular Tree Automata as XML Schemas
We address the problem of tight XML schemas and propose regular tree automata to model XML data. We show that the tree automata model is more powerful than XML DTDs and is closed under the main algebraic operations. We introduce an XML query algebra based on the tree automata model, and discuss query optimization and query pruning techniques. Finally, we show the conversion of tree automata s...
Efficient inclusion checking for deterministic tree automata and XML Schemas
We present algorithms for testing language inclusion L(A) ⊆ L(B) between tree automata in time O(|A| · |B|) where B is deterministic (bottom-up or top-down). We extend our algorithms for testing inclusion of automata for unranked trees A in deterministic DTDs or deterministic EDTDs with restrained competition D in time O(|A| · |Σ| · |D|). Previous algorithms were less efficient or less general.
TREE AUTOMATA BASED ON COMPLETE RESIDUATED LATTICE-VALUED LOGIC: REDUCTION ALGORITHM AND DECISION PROBLEMS
In this paper, we first define the concepts of response function and accessible states of a complete residuated lattice-valued (for simplicity we write $\mathcal{L}$-valued) tree automaton with a threshold $c$. Then, related to these concepts, we prove some lemmas and theorems that are applied in considering some decision problems such as finiteness-value and emptiness-value of recognizable t...
DTD-driven bilingual document generation
Extensively annotated bilingual parallel corpora can be exploited to feed editing tools that integrate the processes of document composition and translation. Here we discuss the architecture of an interactive editing tool that, on top of techniques common to most Translation Memory-based systems, applies the potential of SGML's DTDs to guide the process of bilingual document generation. Rather ...
Publication date: 2001